Overview

Brought to you by YData

Dataset statistics

Number of variables: 18
Number of observations: 50000
Missing cells: 9625
Missing cells (%): 1.1%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 26.3 MiB
Average record size in memory: 551.4 B
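The dataset-level figures are consistent with one another, and the check below shows how they relate. It uses only the numbers reported above. The 26.3 MiB total is itself rounded, so the recomputed record size comes out at about 551.6 B, slightly above the reported 551.4 B.

```python
# Figures copied from the dataset statistics above
n_variables = 18
n_observations = 50_000
missing_cells = 9_625
total_size_mib = 26.3  # rounded in the report

total_cells = n_variables * n_observations        # every variable x every row
missing_pct = 100 * missing_cells / total_cells   # share of empty cells
avg_record_bytes = total_size_mib * 1024 ** 2 / n_observations

print(f"Missing cells (%): {missing_pct:.1f}%")
print(f"Average record size: {avg_record_bytes:.1f} B")
```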

Variable types

Text: 2
Numeric: 11
Boolean: 1
Categorical: 4

Alerts

POSSIBLENterm has constant value "True" [Constant]
Insidesource has constant value "TMHMM2.0" [Constant]
TMhelixsource has constant value "TMHMM2.0" [Constant]
Outsidesource has constant value "TMHMM2.0" [Constant]
ExpnumberofAAsinTMHs is highly overall correlated with Insideend and 5 other fields [High correlation]
Insideend is highly overall correlated with ExpnumberofAAsinTMHs and 5 other fields [High correlation]
Insidestart is highly overall correlated with ExpnumberofAAsinTMHs and 4 other fields [High correlation]
Length is highly overall correlated with Insideend and 1 other field [High correlation]
Outsideend is highly overall correlated with Length and 3 other fields [High correlation]
Outsidestart is highly overall correlated with ExpnumberofAAsinTMHs and 4 other fields [High correlation]
PredictedTMHsNumber is highly overall correlated with ExpnumberofAAsinTMHs and 5 other fields [High correlation]
TMhelixend is highly overall correlated with ExpnumberofAAsinTMHs and 6 other fields [High correlation]
TMhelixstart is highly overall correlated with ExpnumberofAAsinTMHs and 6 other fields [High correlation]
POSSIBLENterm has 9625 (19.2%) missing values [Missing]
Protein_ID has unique values [Unique]
Expnumberfirst60AAs has 2544 (5.1%) zeros [Zeros]
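The Constant, Missing, Unique and Zeros alerts above can be approximated with simple per-column rules. The sketch below is illustrative only. The function name `column_alerts` and the column representation (a Python list with `None` for missing cells) are hypothetical. It does not reproduce ydata-profiling's own thresholds or implementation, and it leaves out correlation alerts.

```python
from typing import Any, Optional


def column_alerts(name: str, values: list[Optional[Any]]) -> list[str]:
    """Return simple alert strings for one column (hypothetical rules)."""
    alerts = []
    present = [v for v in values if v is not None]
    n_missing = len(values) - len(present)
    distinct = set(present)

    # Constant: exactly one distinct non-missing value
    if len(distinct) == 1:
        alerts.append(f'{name} has constant value "{next(iter(distinct))}" [Constant]')
    # Missing: any None entries
    if n_missing:
        pct = 100 * n_missing / len(values)
        alerts.append(f"{name} has {n_missing} ({pct:.1f}%) missing values [Missing]")
    # Unique: every row holds a different, non-missing value
    if present and len(distinct) == len(values):
        alerts.append(f"{name} has unique values [Unique]")
    # Zeros: numeric zeros (bool is excluded because it subclasses int)
    zeros = sum(
        1 for v in present
        if isinstance(v, (int, float)) and not isinstance(v, bool) and v == 0
    )
    if zeros:
        pct = 100 * zeros / len(values)
        alerts.append(f"{name} has {zeros} ({pct:.1f}%) zeros [Zeros]")
    return alerts


print(column_alerts("POSSIBLENterm", [True, True, None, True]))
```

On the toy column above, the function reports both a Constant and a Missing alert, matching the pattern shown for POSSIBLENterm.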

Reproduction

Analysis started: 2025-07-21 08:42:10.988390
Analysis finished: 2025-07-21 08:42:26.196952
Duration: 15.21 seconds
Software version: ydata-profiling v0.0.dev0
Download configuration: config.json

Variables

Distinct: 46765
Distinct (%): 93.5%
Missing: 0
Missing (%): 0.0%
Memory size: 4.2 MiB

Length

Max length: 87
Median length: 86
Mean length: 23.76738
Min length: 6

Characters and Unicode

Total characters: 1188369
Distinct characters: 65
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
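In this report every character falls into a single "(unknown)" category, script and block. For comparison, the sketch below uses Python's standard `unicodedata` module to show the Unicode general categories the sample identifiers would receive. This is an illustration, not how ydata-profiling computes its breakdown.

```python
import unicodedata
from collections import Counter


def category_counts(values: list[str]) -> Counter:
    """Count characters by Unicode general category (e.g. 'Lu', 'Nd', 'Pc')."""
    return Counter(unicodedata.category(ch) for value in values for ch in value)


# First, third and fifth sample rows of this variable
sample = ["MGV-GENOME-0377366", "TemPhD_cluster_54944", "uvig_280215"]
print(category_counts(sample))
```

The categories separate the parts of these identifiers:

- Underscores fall under connector punctuation (Pc).
- Hyphens fall under dash punctuation (Pd).
- Digits are Nd.
- Letters split into upper case (Lu) and lower case (Ll).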

Unique

Unique: 43801
Unique (%): 87.6%

Sample

1st row: MGV-GENOME-0377366
2nd row: MGV-GENOME-0228589
3rd row: TemPhD_cluster_54944
4th row: TemPhD_cluster_21940
5th row: uvig_280215

Words
Value | Count | Frequency (%)
uvig_82024 | 5 | < 0.1%
uvig_26080 | 4 | < 0.1%
mgv-genome-0372934 | 4 | < 0.1%
mgv-genome-0379973 | 4 | < 0.1%
uvig_183868 | 4 | < 0.1%
uvig_134152 | 4 | < 0.1%
uvig_186748 | 4 | < 0.1%
temphd_cluster_6638 | 4 | < 0.1%
mgv-genome-0379883 | 4 | < 0.1%
mgv-genome-0379887 | 4 | < 0.1%
Other values (46755) | 49959 | 99.9%

Most occurring characters

Value | Count | Frequency (%)
_ | 114246 | 9.6%
1 | 61386 | 5.2%
0 | 52867 | 4.4%
3 | 50156 | 4.2%
2 | 49524 | 4.2%
E | 41171 | 3.5%
4 | 40816 | 3.4%
5 | 39822 | 3.4%
M | 37577 | 3.2%
7 | 37163 | 3.1%
Other values (55) | 663641 | 55.8%

Most occurring categories

Value | Count | Frequency (%)
(unknown) | 1188369 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
(unknown) | 1188369 | 100.0%

Most occurring blocks

Value | Count | Frequency (%)
(unknown) | 1188369 | 100.0%

Protein_ID
Text

Unique 

Distinct: 50000
Distinct (%): 100.0%
Missing: 0
Missing (%): 0.0%
Memory size: 4.4 MiB

Length

Max length: 90
Median length: 88
Mean length: 26.58144
Min length: 8

Characters and Unicode

Total characters: 1329072
Distinct characters: 65
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1

Unique

Unique: 50000
Unique (%): 100.0%

Sample

1st row: MGV-GENOME-0377366_94
2nd row: MGV-GENOME-0228589_3
3rd row: TemPhD_cluster_54944_50
4th row: TemPhD_cluster_21940_29
5th row: uvig_280215_16

Words
Value | Count | Frequency (%)
temphd_cluster_30092_4 | 1 | < 0.1%
temphd_cluster_6461_44 | 1 | < 0.1%
mgv-genome-0377366_94 | 1 | < 0.1%
mgv-genome-0228589_3 | 1 | < 0.1%
temphd_cluster_54944_50 | 1 | < 0.1%
temphd_cluster_21940_29 | 1 | < 0.1%
uvig_280215_16 | 1 | < 0.1%
temphd_cluster_2820_6 | 1 | < 0.1%
uvig_396803_67 | 1 | < 0.1%
mgv-genome-0085121_16 | 1 | < 0.1%
Other values (49990) | 49990 | > 99.9%
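In the five sample rows, each Protein_ID is the identifier from the preceding text column followed by an underscore and a number. For example, MGV-GENOME-0377366 becomes MGV-GENOME-0377366_94. This report does not verify that the pattern holds for every row. Assuming it does, splitting on the last underscore separates the two parts. `split_protein_id` is a hypothetical helper:

```python
def split_protein_id(protein_id: str) -> tuple[str, int]:
    """Split '<genome id>_<number>' on the last underscore (assumed format)."""
    genome_id, sep, number = protein_id.rpartition("_")
    if not sep or not number.isdigit():
        raise ValueError(f"unexpected Protein_ID format: {protein_id!r}")
    return genome_id, int(number)


for pid in ["MGV-GENOME-0377366_94", "TemPhD_cluster_54944_50", "uvig_280215_16"]:
    print(split_protein_id(pid))
```

Splitting on the last underscore keeps identifiers that contain underscores themselves, such as TemPhD_cluster_54944, intact.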

Most occurring characters

Value | Count | Frequency (%)
_ | 163061 | 12.3%
1 | 78268 | 5.9%
2 | 62660 | 4.7%
3 | 61828 | 4.7%
0 | 58559 | 4.4%
4 | 50588 | 3.8%
5 | 48220 | 3.6%
6 | 43743 | 3.3%
7 | 43737 | 3.3%
8 | 41403 | 3.1%
Other values (55) | 677005 | 50.9%

Most occurring categories

Value | Count | Frequency (%)
(unknown) | 1329072 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
(unknown) | 1329072 | 100.0%

Most occurring blocks

Value | Count | Frequency (%)
(unknown) | 1329072 | 100.0%

Length
Real number (ℝ)

High correlation 

Distinct: 1656
Distinct (%): 3.3%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 220.27654
Minimum: 21
Maximum: 7694
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 781.2 KiB

Quantile statistics

Minimum: 21
5th percentile: 46
Q1: 81
Median: 129
Q3: 217
95th percentile: 793.05
Maximum: 7694
Range: 7673
Interquartile range (IQR): 136
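Range and interquartile range follow directly from the quantiles: Range = 7694 - 21 = 7673 and IQR = Q3 - Q1 = 217 - 81 = 136. The sketch below computes the same summary with Python's `statistics.quantiles` using the `inclusive` method. This interpolates linearly between order statistics, which should correspond to pandas' default. That the report used this exact rule is an assumption. The toy input is built from Length's reported percentiles, with 793.05 rounded to 793.

```python
import statistics


def quantile_summary(data: list[float]) -> dict[str, float]:
    """Minimum, quartiles, maximum, range and IQR of a numeric sample."""
    # 'inclusive' treats the data as the whole population range (linear interpolation)
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    lo, hi = min(data), max(data)
    return {
        "min": lo, "Q1": q1, "median": median, "Q3": q3, "max": hi,
        "range": hi - lo,
        "IQR": q3 - q1,
    }


print(quantile_summary([46, 81, 129, 217, 793]))
```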

Descriptive statistics

Standard deviation: 293.61812
Coefficient of variation (CV): 1.3329523
Kurtosis: 46.935516
Mean: 220.27654
Median Absolute Deviation (MAD): 58
Skewness: 4.9451327
Sum: 11013827
Variance: 86211.603
Monotonicity: Not monotonic
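Several descriptive statistics are derived from others:

- Variance is the squared standard deviation: 293.61812² ≈ 86211.6.
- CV is the standard deviation divided by the mean: 293.61812 / 220.27654 ≈ 1.33295.
- Sum is the mean times the number of rows: 220.27654 × 50000 ≈ 11013827.
- MAD is the median of absolute deviations from the median.

The sketch below assumes the sample standard deviation (n - 1 denominator), which is pandas' default. The report may differ in detail.

```python
import statistics


def descriptive(data: list[float]) -> dict[str, float]:
    """Descriptive statistics using the sample standard deviation (n - 1)."""
    mean = statistics.fmean(data)
    sd = statistics.stdev(data)                            # sample standard deviation
    med = statistics.median(data)
    mad = statistics.median(abs(x - med) for x in data)    # median absolute deviation
    return {
        "mean": mean,
        "std": sd,
        "variance": sd ** 2,
        "cv": sd / mean,
        "mad": mad,
        "sum": sum(data),
    }


stats = descriptive([2, 4, 4, 4, 5, 5, 7, 9])
print(stats)

# Cross-check two derived figures reported for Length
print(293.61812 / 220.27654)  # close to the reported CV, 1.3329523
print(293.61812 ** 2)         # close to the reported variance, 86211.603
```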
Histogram with fixed size bins (bins=50)

Common values
Value | Count | Frequency (%)
66 | 376 | 0.8%
68 | 375 | 0.8%
71 | 366 | 0.7%
60 | 358 | 0.7%
107 | 356 | 0.7%
93 | 332 | 0.7%
67 | 328 | 0.7%
74 | 327 | 0.7%
70 | 325 | 0.7%
116 | 323 | 0.6%
Other values (1646) | 46534 | 93.1%

Minimum 10 values
Value | Count | Frequency (%)
21 | 1 | < 0.1%
22 | 2 | < 0.1%
23 | 1 | < 0.1%
24 | 2 | < 0.1%
25 | 6 | < 0.1%
26 | 5 | < 0.1%
27 | 6 | < 0.1%
28 | 6 | < 0.1%
29 | 92 | 0.2%
30 | 98 | 0.2%

Maximum 10 values
Value | Count | Frequency (%)
7694 | 1 | < 0.1%
7324 | 1 | < 0.1%
6121 | 1 | < 0.1%
5721 | 1 | < 0.1%
5089 | 1 | < 0.1%
5055 | 1 | < 0.1%
4711 | 1 | < 0.1%
4439 | 1 | < 0.1%
4421 | 1 | < 0.1%
4289 | 1 | < 0.1%
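The histogram caption refers to fixed-size bins. The range [min, max] is split into equal-width intervals, and values are counted per interval. With bins=50 and Length's range of 7673, each bin would be about 153.5 wide, so most values land in the first few bins. The sketch below shows the binning rule. Counting the maximum in the last bin follows `numpy.histogram`; that the report's plot does the same is an assumption.

```python
def fixed_width_histogram(data: list[float], bins: int) -> list[int]:
    """Count values in `bins` equal-width intervals spanning [min, max]."""
    lo, hi = min(data), max(data)
    width = (hi - lo) / bins
    counts = [0] * bins
    for x in data:
        if width == 0:
            idx = 0  # all values identical: everything goes in the first bin
        else:
            # Half-open intervals, except the last one, which also holds the maximum
            idx = min(int((x - lo) / width), bins - 1)
        counts[idx] += 1
    return counts


print(fixed_width_histogram([21, 46, 81, 129, 217, 793, 7694], bins=5))
```

One extreme value stretches the range, so six of the seven toy values share the first bin. This is the same effect that makes Length's histogram heavily right-skewed.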

PredictedTMHsNumber
Real number (ℝ)

High correlation 

Distinct: 25
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 1.9024
Minimum: 1
Maximum: 26
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 781.2 KiB

Quantile statistics

Minimum: 1
5th percentile: 1
Q1: 1
Median: 1
Q3: 2
95th percentile: 5
Maximum: 26
Range: 25
Interquartile range (IQR): 1

Descriptive statistics

Standard deviation: 1.904906
Coefficient of variation (CV): 1.0013173
Kurtosis: 28.830601
Mean: 1.9024
Median Absolute Deviation (MAD): 0
Skewness: 4.5453767
Sum: 95120
Variance: 3.6286668
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=25)

Common values
Value | Count | Frequency (%)
1 | 29439 | 58.9%
2 | 12185 | 24.4%
3 | 3685 | 7.4%
4 | 1937 | 3.9%
5 | 702 | 1.4%
6 | 611 | 1.2%
10 | 262 | 0.5%
7 | 242 | 0.5%
8 | 221 | 0.4%
12 | 155 | 0.3%
Other values (15) | 561 | 1.1%

Minimum 10 values
Value | Count | Frequency (%)
1 | 29439 | 58.9%
2 | 12185 | 24.4%
3 | 3685 | 7.4%
4 | 1937 | 3.9%
5 | 702 | 1.4%
6 | 611 | 1.2%
7 | 242 | 0.5%
8 | 221 | 0.4%
9 | 138 | 0.3%
10 | 262 | 0.5%

Maximum 10 values
Value | Count | Frequency (%)
26 | 3 | < 0.1%
25 | 1 | < 0.1%
24 | 9 | < 0.1%
22 | 8 | < 0.1%
21 | 4 | < 0.1%
20 | 25 | 0.1%
19 | 6 | < 0.1%
18 | 43 | 0.1%
17 | 5 | < 0.1%
16 | 53 | 0.1%

ExpnumberofAAsinTMHs
Real number (ℝ)

High correlation 

Distinct: 41719
Distinct (%): 83.4%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 42.077003
Minimum: 8.23423
Maximum: 577.09331
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 781.2 KiB

Quantile statistics

Minimum: 8.23423
5th percentile: 17.47916
Q1: 20.869492
Median: 23.18689
Q3: 44.360855
95th percentile: 112.93182
Maximum: 577.09331
Range: 568.85908
Interquartile range (IQR): 23.491363

Descriptive statistics

Standard deviation: 44.015965
Coefficient of variation (CV): 1.0460813
Kurtosis: 28.205171
Mean: 42.077003
Median Absolute Deviation (MAD): 5.50395
Skewness: 4.4694202
Sum: 2103850.2
Variance: 1937.4051
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)

Common values
Value | Count | Frequency (%)
24.87583 | 69 | 0.1%
18.23661 | 66 | 0.1%
36.04048 | 51 | 0.1%
210.43458 | 48 | 0.1%
47.86547 | 42 | 0.1%
108.33627 | 33 | 0.1%
44.96264 | 30 | 0.1%
20.67098 | 30 | 0.1%
71.27387 | 27 | 0.1%
46.4987 | 26 | 0.1%
Other values (41709) | 49578 | 99.2%

Minimum 10 values
Value | Count | Frequency (%)
8.23423 | 1 | < 0.1%
8.98159 | 1 | < 0.1%
9.06624 | 1 | < 0.1%
9.2583 | 1 | < 0.1%
9.44646 | 1 | < 0.1%
9.6953 | 1 | < 0.1%
9.72277 | 1 | < 0.1%
9.89254 | 1 | < 0.1%
9.90831 | 1 | < 0.1%
10.10174 | 1 | < 0.1%

Maximum 10 values
Value | Count | Frequency (%)
577.09331 | 1 | < 0.1%
572.18767 | 1 | < 0.1%
561.6315 | 1 | < 0.1%
558.81059 | 1 | < 0.1%
556.48551 | 1 | < 0.1%
554.55197 | 1 | < 0.1%
551.34156 | 1 | < 0.1%
550.00044 | 1 | < 0.1%
546.57729 | 1 | < 0.1%
543.21233 | 1 | < 0.1%

Expnumberfirst60AAs
Real number (ℝ)

Zeros 

Distinct: 37247
Distinct (%): 74.5%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 20.538714
Minimum: 0
Maximum: 49.32205
Zeros: 2544
Zeros (%): 5.1%
Negative: 0
Negative (%): 0.0%
Memory size: 781.2 KiB

Quantile statistics

Minimum: 0
5th percentile: 0
Q1: 16.485703
Median: 21.202375
Q3: 25.194865
95th percentile: 41.458534
Maximum: 49.32205
Range: 49.32205
Interquartile range (IQR): 8.7091625

Descriptive statistics

Standard deviation: 12.199837
Coefficient of variation (CV): 0.59399225
Kurtosis: -0.45601502
Mean: 20.538714
Median Absolute Deviation (MAD): 4.46552
Skewness: -0.10608653
Sum: 1026935.7
Variance: 148.83602
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)

Common values
Value | Count | Frequency (%)
0 | 2544 | 5.1%
0.00018 | 92 | 0.2%
42.15085 | 73 | 0.1%
24.87583 | 69 | 0.1%
18.23661 | 66 | 0.1%
1 × 10^-5 | 53 | 0.1%
0.0002 | 52 | 0.1%
36.04048 | 51 | 0.1%
0.00019 | 48 | 0.1%
0.00015 | 46 | 0.1%
Other values (37237) | 46906 | 93.8%

Minimum 10 values
Value | Count | Frequency (%)
0 | 2544 | 5.1%
1 × 10^-5 | 53 | 0.1%
2 × 10^-5 | 29 | 0.1%
3 × 10^-5 | 23 | < 0.1%
4 × 10^-5 | 16 | < 0.1%
5 × 10^-5 | 10 | < 0.1%
6 × 10^-5 | 24 | < 0.1%
7 × 10^-5 | 12 | < 0.1%
8 × 10^-5 | 26 | 0.1%
9 × 10^-5 | 15 | < 0.1%

Maximum 10 values
Value | Count | Frequency (%)
49.32205 | 1 | < 0.1%
49.17952 | 1 | < 0.1%
47.95518 | 1 | < 0.1%
47.95169 | 2 | < 0.1%
47.80002 | 1 | < 0.1%
47.8 | 1 | < 0.1%
47.7514 | 1 | < 0.1%
47.69621 | 1 | < 0.1%
47.67108 | 1 | < 0.1%
47.47652 | 1 | < 0.1%

TotalprobofNin
Real number (ℝ)

Distinct: 30669
Distinct (%): 61.3%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 0.58906106
Minimum: 0
Maximum: 1
Zeros: 4
Zeros (%): < 0.1%
Negative: 0
Negative (%): 0.0%
Memory size: 781.2 KiB

Quantile statistics

Minimum: 0
5th percentile: 0.021739
Q1: 0.23569
Median: 0.691315
Q3: 0.925905
95th percentile: 0.9961705
Maximum: 1
Range: 1
Interquartile range (IQR): 0.690215

Descriptive statistics

Standard deviation: 0.35347854
Coefficient of variation (CV): 0.60007114
Kurtosis: -1.4075532
Mean: 0.58906106
Median Absolute Deviation (MAD): 0.28049
Skewness: -0.37670343
Sum: 29453.053
Variance: 0.12494708
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)

Common values
Value | Count | Frequency (%)
0.99854 | 79 | 0.2%
0.56701 | 69 | 0.1%
0.59265 | 66 | 0.1%
0.03881 | 53 | 0.1%
0.86194 | 51 | 0.1%
0.28216 | 42 | 0.1%
0.99602 | 40 | 0.1%
0.95017 | 35 | 0.1%
0.97286 | 33 | 0.1%
0.99959 | 32 | 0.1%
Other values (30659) | 49500 | 99.0%

Minimum 10 values
Value | Count | Frequency (%)
0 | 4 | < 0.1%
1 × 10^-5 | 3 | < 0.1%
2 × 10^-5 | 5 | < 0.1%
3 × 10^-5 | 2 | < 0.1%
4 × 10^-5 | 10 | < 0.1%
5 × 10^-5 | 6 | < 0.1%
6 × 10^-5 | 11 | < 0.1%
7 × 10^-5 | 2 | < 0.1%
8 × 10^-5 | 3 | < 0.1%
9 × 10^-5 | 1 | < 0.1%

Maximum 10 values
Value | Count | Frequency (%)
1 | 12 | < 0.1%
0.99999 | 19 | < 0.1%
0.99998 | 27 | 0.1%
0.99997 | 11 | < 0.1%
0.99996 | 16 | < 0.1%
0.99995 | 22 | < 0.1%
0.99994 | 13 | < 0.1%
0.99993 | 15 | < 0.1%
0.99992 | 11 | < 0.1%
0.99991 | 13 | < 0.1%

POSSIBLENterm
Boolean

Constant, Missing

Distinct: 1
Distinct (%): < 0.1%
Missing: 9625
Missing (%): 19.2%
Memory size: 2.1 MiB
True: 40375
(Missing): 9625

Value | Count | Frequency (%)
True | 40375 | 80.8%
(Missing) | 9625 | 19.2%
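POSSIBLENterm only ever holds True; the other 9625 rows (19.2%) are missing. The column may come from TMHMM's "POSSIBLE N-term signal sequence" warning, which is only printed when the flag applies. If so, a missing value plausibly means "not flagged". That reading is an assumption to check against the pipeline that built this table. Under it, the column can be turned into a strict boolean:

```python
from typing import Optional


def to_strict_bool(flags: list[Optional[bool]]) -> list[bool]:
    """Map True to True and anything else (including None) to False.

    Assumes a missing POSSIBLENterm value means the flag was not raised.
    """
    return [flag is True for flag in flags]


column = [True, None, True, None, True]
strict = to_strict_bool(column)
share_true = 100 * strict.count(True) / len(strict)
print(strict, f"{share_true:.1f}% True")
```

Applied to this column, the rule would give 40375 True (80.8%) and 9625 False (19.2%).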

Insidesource
Categorical

Constant 

Distinct: 1
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 3.5 MiB
TMHMM2.0: 50000

Length

Max length: 8
Median length: 8
Mean length: 8
Min length: 8

Characters and Unicode

Total characters: 400000
Distinct characters: 6
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: TMHMM2.0
2nd row: TMHMM2.0
3rd row: TMHMM2.0
4th row: TMHMM2.0
5th row: TMHMM2.0

Common Values

Value | Count | Frequency (%)
TMHMM2.0 | 50000 | 100.0%
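Insidesource, TMhelixsource and Outsidesource each hold the single value TMHMM2.0 in all 50000 rows. Beyond recording the prediction tool, they carry no information for analysis. The sketch below drops such columns from a column-oriented table, represented here as a plain dict of lists (a hypothetical format). In pandas, comparing `df.nunique(dropna=False)` to 1 is a common alternative. Missing values count as a value here, so POSSIBLENterm (True or missing) is kept.

```python
def drop_constant_columns(table: dict[str, list]) -> tuple[dict[str, list], list[str]]:
    """Remove columns with a single distinct value (missing counts as a value)."""
    kept, dropped = {}, []
    for name, values in table.items():
        if len(set(values)) <= 1:
            dropped.append(name)
        else:
            kept[name] = values
    return kept, dropped


table = {
    "Insidesource": ["TMHMM2.0", "TMHMM2.0", "TMHMM2.0"],
    "POSSIBLENterm": [True, None, True],
    "PredictedTMHsNumber": [1, 2, 1],
}
kept, dropped = drop_constant_columns(table)
print(dropped)      # ['Insidesource']
print(list(kept))   # ['POSSIBLENterm', 'PredictedTMHsNumber']
```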

Length

Histogram of lengths of the category

Common Values (Plot)

Words
Value | Count | Frequency (%)
tmhmm2.0 | 50000 | 100.0%

Most occurring characters

Value | Count | Frequency (%)
M | 150000 | 37.5%
T | 50000 | 12.5%
H | 50000 | 12.5%
2 | 50000 | 12.5%
. | 50000 | 12.5%
0 | 50000 | 12.5%

Most occurring categories

Value | Count | Frequency (%)
(unknown) | 400000 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
(unknown) | 400000 | 100.0%

Most occurring blocks

Value | Count | Frequency (%)
(unknown) | 400000 | 100.0%

Insidestart
Real number (ℝ)

High correlation 

Distinct: 1160
Distinct (%): 2.3%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 93.70036
Minimum: 1
Maximum: 7674
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 781.2 KiB

Quantile statistics

Minimum: 1
5th percentile: 1
Q1: 1
Median: 33
Q3: 86
95th percentile: 468
Maximum: 7674
Range: 7673
Interquartile range (IQR): 85

Descriptive statistics

Standard deviation: 196.1528
Coefficient of variation (CV): 2.093405
Kurtosis: 126.73208
Mean: 93.70036
Median Absolute Deviation (MAD): 32
Skewness: 7.025596
Sum: 4685018
Variance: 38475.921
Monotonicity: Not monotonic
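Insidestart is strongly right-skewed (skewness ≈ 7.03, kurtosis ≈ 126.7). A third of the rows start at position 1, while a handful start beyond 3000. The sketch below shows the bias-adjusted sample skewness (G1) and excess kurtosis (G2) estimators used by pandas' `Series.skew()` and `Series.kurt()`. That this report uses the same estimators is an assumption. Both functions need a non-constant sample: skewness requires n ≥ 3 and kurtosis n ≥ 4.

```python
import math


def skewness(data: list[float]) -> float:
    """Adjusted Fisher-Pearson sample skewness (G1)."""
    n = len(data)
    mean = sum(data) / n
    m2 = sum((x - mean) ** 2 for x in data) / n
    m3 = sum((x - mean) ** 3 for x in data) / n
    g1 = m3 / m2 ** 1.5
    return math.sqrt(n * (n - 1)) / (n - 2) * g1


def excess_kurtosis(data: list[float]) -> float:
    """Bias-adjusted sample excess kurtosis (G2); 0 for a normal distribution."""
    n = len(data)
    mean = sum(data) / n
    s2 = sum((x - mean) ** 2 for x in data)
    s4 = sum((x - mean) ** 4 for x in data)
    term = n * (n + 1) * (n - 1) * s4 / ((n - 2) * (n - 3) * s2 ** 2)
    return term - 3 * (n - 1) ** 2 / ((n - 2) * (n - 3))


# Toy sample shaped like Insidestart: many small values, one very large one
sample = [1, 1, 1, 33, 86, 468, 7674]
print(round(skewness(sample), 3), round(excess_kurtosis(sample), 3))
```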
Histogram with fixed size bins (bins=50)

Common values
Value | Count | Frequency (%)
1 | 16488 | 33.0%
27 | 1770 | 3.5%
28 | 1641 | 3.3%
33 | 1366 | 2.7%
38 | 1120 | 2.2%
24 | 1071 | 2.1%
22 | 831 | 1.7%
25 | 659 | 1.3%
43 | 604 | 1.2%
23 | 487 | 1.0%
Other values (1150) | 23963 | 47.9%

Minimum 10 values
Value | Count | Frequency (%)
1 | 16488 | 33.0%
19 | 15 | < 0.1%
20 | 27 | 0.1%
21 | 8 | < 0.1%
22 | 831 | 1.7%
23 | 487 | 1.0%
24 | 1071 | 2.1%
25 | 659 | 1.3%
26 | 177 | 0.4%
27 | 1770 | 3.5%

Maximum 10 values
Value | Count | Frequency (%)
7674 | 1 | < 0.1%
7304 | 1 | < 0.1%
5139 | 1 | < 0.1%
4667 | 1 | < 0.1%
3783 | 1 | < 0.1%
3223 | 2 | < 0.1%
3210 | 1 | < 0.1%
3129 | 1 | < 0.1%
3000 | 1 | < 0.1%
2852 | 1 | < 0.1%

Insideend
Real number (ℝ)

High correlation 

Distinct: 1254
Distinct (%): 2.5%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 140.22592
Minimum: 1
Maximum: 7694
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 781.2 KiB

Quantile statistics

Minimum: 1
5th percentile: 6
Q1: 36
Median: 86
Q3: 153
95th percentile: 522
Maximum: 7694
Range: 7693
Interquartile range (IQR): 117

Descriptive statistics

Standard deviation: 209.16529
Coefficient of variation (CV): 1.4916307
Kurtosis: 100.12904
Mean: 140.22592
Median Absolute Deviation (MAD): 56
Skewness: 6.1729031
Sum: 7011296
Variance: 43750.117
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)

Common values
Value | Count | Frequency (%)
6 | 3768 | 7.5%
12 | 1354 | 2.7%
4 | 1323 | 2.6%
11 | 928 | 1.9%
20 | 858 | 1.7%
19 | 517 | 1.0%
8 | 460 | 0.9%
1 | 370 | 0.7%
67 | 353 | 0.7%
71 | 336 | 0.7%
Other values (1244) | 39733 | 79.5%

Minimum 10 values
Value | Count | Frequency (%)
1 | 370 | 0.7%
2 | 52 | 0.1%
4 | 1323 | 2.6%
6 | 3768 | 7.5%
8 | 460 | 0.9%
10 | 35 | 0.1%
11 | 928 | 1.9%
12 | 1354 | 2.7%
15 | 65 | 0.1%
16 | 176 | 0.4%

Maximum 10 values
Value | Count | Frequency (%)
7694 | 1 | < 0.1%
7324 | 1 | < 0.1%
5144 | 1 | < 0.1%
4826 | 1 | < 0.1%
3789 | 1 | < 0.1%
3582 | 2 | < 0.1%
3488 | 1 | < 0.1%
3221 | 1 | < 0.1%
3101 | 1 | < 0.1%
2861 | 1 | < 0.1%

TMhelixsource
Categorical

Constant 

Distinct: 1
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 3.5 MiB
TMHMM2.0: 50000

Length

Max length: 8
Median length: 8
Mean length: 8
Min length: 8

Characters and Unicode

Total characters: 400000
Distinct characters: 6
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: TMHMM2.0
2nd row: TMHMM2.0
3rd row: TMHMM2.0
4th row: TMHMM2.0
5th row: TMHMM2.0

Common Values

Value | Count | Frequency (%)
TMHMM2.0 | 50000 | 100.0%

Length

Histogram of lengths of the category

Common Values (Plot)

Words
Value | Count | Frequency (%)
tmhmm2.0 | 50000 | 100.0%

Most occurring characters

Value | Count | Frequency (%)
M | 150000 | 37.5%
T | 50000 | 12.5%
H | 50000 | 12.5%
2 | 50000 | 12.5%
. | 50000 | 12.5%
0 | 50000 | 12.5%

Most occurring categories

Value | Count | Frequency (%)
(unknown) | 400000 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
(unknown) | 400000 | 100.0%

Most occurring blocks

Value | Count | Frequency (%)
(unknown) | 400000 | 100.0%

TMhelixstart
Real number (ℝ)

High correlation 

Distinct: 1178
Distinct (%): 2.4%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 97.82874
Minimum: 2
Maximum: 7651
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 781.2 KiB

Quantile statistics

Minimum: 2
5th percentile: 4
Q1: 10
Median: 37
Q3: 92
95th percentile: 473
Maximum: 7651
Range: 7649
Interquartile range (IQR): 82

Descriptive statistics

Standard deviation: 197.45472
Coefficient of variation (CV): 2.0183712
Kurtosis: 123.04498
Mean: 97.82874
Median Absolute Deviation (MAD): 30
Skewness: 6.932696
Sum: 4891437
Variance: 38988.365
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)

Common values
Value | Count | Frequency (%)
7 | 3780 | 7.6%
5 | 3459 | 6.9%
4 | 3296 | 6.6%
10 | 1763 | 3.5%
13 | 1454 | 2.9%
15 | 1210 | 2.4%
20 | 1025 | 2.1%
12 | 932 | 1.9%
21 | 859 | 1.7%
39 | 605 | 1.2%
Other values (1168) | 31617 | 63.2%

Minimum 10 values
Value | Count | Frequency (%)
2 | 370 | 0.7%
3 | 52 | 0.1%
4 | 3296 | 6.6%
5 | 3459 | 6.9%
6 | 269 | 0.5%
7 | 3780 | 7.6%
9 | 460 | 0.9%
10 | 1763 | 3.5%
11 | 87 | 0.2%
12 | 932 | 1.9%

Maximum 10 values
Value | Count | Frequency (%)
7651 | 1 | < 0.1%
7281 | 1 | < 0.1%
5145 | 1 | < 0.1%
4827 | 1 | < 0.1%
3763 | 1 | < 0.1%
3222 | 1 | < 0.1%
3200 | 2 | < 0.1%
3106 | 1 | < 0.1%
2977 | 1 | < 0.1%
2829 | 1 | < 0.1%

TMhelixend
Real number (ℝ)

High correlation 

Distinct: 1185
Distinct (%): 2.4%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 118.6662
Minimum: 18
Maximum: 7673
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 781.2 KiB

Quantile statistics

Minimum: 18
5th percentile: 24
Q1: 31
Median: 58
Q3: 113
95th percentile: 495
Maximum: 7673
Range: 7655
Interquartile range (IQR): 82

Descriptive statistics

Standard deviation: 197.68297
Coefficient of variation (CV): 1.6658743
Kurtosis: 122.5839
Mean: 118.6662
Median Absolute Deviation (MAD): 31
Skewness: 6.9184816
Sum: 5933310
Variance: 39078.557
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)

Common values
Value | Count | Frequency (%)
29 | 2714 | 5.4%
26 | 2369 | 4.7%
27 | 2332 | 4.7%
24 | 1606 | 3.2%
32 | 1379 | 2.8%
35 | 1073 | 2.1%
23 | 932 | 1.9%
37 | 900 | 1.8%
34 | 842 | 1.7%
42 | 831 | 1.7%
Other values (1175) | 35022 | 70.0%

Minimum 10 values
Value | Count | Frequency (%)
18 | 6 | < 0.1%
19 | 62 | 0.1%
20 | 31 | 0.1%
21 | 698 | 1.4%
22 | 617 | 1.2%
23 | 932 | 1.9%
24 | 1606 | 3.2%
25 | 350 | 0.7%
26 | 2369 | 4.7%
27 | 2332 | 4.7%

Maximum 10 values
Value | Count | Frequency (%)
7673 | 1 | < 0.1%
7303 | 1 | < 0.1%
5167 | 1 | < 0.1%
4849 | 1 | < 0.1%
3782 | 1 | < 0.1%
3244 | 1 | < 0.1%
3222 | 2 | < 0.1%
3128 | 1 | < 0.1%
2999 | 1 | < 0.1%
2851 | 1 | < 0.1%

Outsidesource
Categorical

Constant 

Distinct: 1
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 3.5 MiB
TMHMM2.0: 50000

Length

Max length: 8
Median length: 8
Mean length: 8
Min length: 8

Characters and Unicode

Total characters: 400000
Distinct characters: 6
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: TMHMM2.0
2nd row: TMHMM2.0
3rd row: TMHMM2.0
4th row: TMHMM2.0
5th row: TMHMM2.0

Common Values

Value | Count | Frequency (%)
TMHMM2.0 | 50000 | 100.0%

Length

Histogram of lengths of the category

Common Values (Plot)

Words
Value | Count | Frequency (%)
tmhmm2.0 | 50000 | 100.0%

Most occurring characters

Value | Count | Frequency (%)
M | 150000 | 37.5%
T | 50000 | 12.5%
H | 50000 | 12.5%
2 | 50000 | 12.5%
. | 50000 | 12.5%
0 | 50000 | 12.5%

Most occurring categories

Value | Count | Frequency (%)
(unknown) | 400000 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
(unknown) | 400000 | 100.0%

Most occurring blocks

Value | Count | Frequency (%)
(unknown) | 400000 | 100.0%

Outsidestart
Real number (ℝ)

High correlation 

Distinct: 1102
Distinct (%): 2.2%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 90.05776
Minimum: 1
Maximum: 5168
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 781.2 KiB

Quantile statistics

Minimum: 1
5th percentile: 1
Q1: 1
Median: 36
Q3: 86
95th percentile: 429
Maximum: 5168
Range: 5167
Interquartile range (IQR): 85

Descriptive statistics

Standard deviation: 175.93883
Coefficient of variation (CV): 1.9536221
Kurtosis: 53.157012
Mean: 90.05776
Median Absolute Deviation (MAD): 35
Skewness: 5.2738131
Sum: 4502888
Variance: 30954.472
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)

Common values
Value | Count | Frequency (%)
1 | 12951 | 25.9%
30 | 3357 | 6.7%
25 | 1747 | 3.5%
36 | 1643 | 3.3%
27 | 1372 | 2.7%
28 | 1315 | 2.6%
44 | 1077 | 2.2%
35 | 1012 | 2.0%
32 | 854 | 1.7%
43 | 648 | 1.3%
Other values (1092) | 24024 | 48.0%

Minimum 10 values
Value | Count | Frequency (%)
1 | 12951 | 25.9%
17 | 7 | < 0.1%
18 | 4 | < 0.1%
19 | 6 | < 0.1%
20 | 202 | 0.4%
21 | 69 | 0.1%
22 | 179 | 0.4%
23 | 500 | 1.0%
24 | 114 | 0.2%
25 | 1747 | 3.5%

Maximum 10 values
Value | Count | Frequency (%)
5168 | 1 | < 0.1%
4850 | 1 | < 0.1%
3245 | 1 | < 0.1%
2821 | 1 | < 0.1%
2720 | 1 | < 0.1%
2602 | 1 | < 0.1%
2585 | 1 | < 0.1%
2396 | 1 | < 0.1%
2358 | 1 | < 0.1%
2087 | 1 | < 0.1%

Outsideend
Real number (ℝ)

High correlation 

Distinct: 1634
Distinct (%): 3.3%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 176.87936
Minimum: 3
Maximum: 7650
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 781.2 KiB

Quantile statistics

Minimum: 3
5th percentile: 3
Q1: 33
Median: 81
Q3: 184
95th percentile: 734
Maximum: 7650
Range: 7647
Interquartile range (IQR): 151

Descriptive statistics

Standard deviation: 297.3179
Coefficient of variation (CV): 1.6809078
Kurtosis: 45.684828
Mean: 176.87936
Median Absolute Deviation (MAD): 62
Skewness: 4.8835224
Sum: 8843968
Variance: 88397.932
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)

Common values
Value | Count | Frequency (%)
3 | 3296 | 6.6%
4 | 2136 | 4.3%
9 | 1763 | 3.5%
14 | 1210 | 2.4%
38 | 531 | 1.1%
19 | 508 | 1.0%
33 | 446 | 0.9%
32 | 383 | 0.8%
30 | 369 | 0.7%
39 | 367 | 0.7%
Other values (1624) | 38991 | 78.0%

Minimum 10 values
Value | Count | Frequency (%)
3 | 3296 | 6.6%
4 | 2136 | 4.3%
5 | 269 | 0.5%
6 | 12 | < 0.1%
9 | 1763 | 3.5%
10 | 52 | 0.1%
11 | 4 | < 0.1%
12 | 100 | 0.2%
14 | 1210 | 2.4%
16 | 12 | < 0.1%

Maximum 10 values
Value | Count | Frequency (%)
7650 | 1 | < 0.1%
7280 | 1 | < 0.1%
6121 | 1 | < 0.1%
5721 | 1 | < 0.1%
5089 | 1 | < 0.1%
5055 | 1 | < 0.1%
4711 | 1 | < 0.1%
4439 | 1 | < 0.1%
4421 | 1 | < 0.1%
4289 | 1 | < 0.1%

Phage_source
Categorical

Distinct: 13
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 3.3 MiB
MGV: 14613
GPD: 13143
TemPhD: 7794
GOV2: 6868
CHVD: 3605
Other values (8): 3977

Length

Max length: 8
Median length: 3
Mean length: 3.82022
Min length: 3

Characters and Unicode

Total characters: 191011
Distinct characters: 28
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: MGV
2nd row: MGV
3rd row: TemPhD
4th row: TemPhD
5th row: GPD

Common Values

Value | Count | Frequency (%)
MGV | 14613 | 29.2%
GPD | 13143 | 26.3%
TemPhD | 7794 | 15.6%
GOV2 | 6868 | 13.7%
CHVD | 3605 | 7.2%
GVD | 1400 | 2.8%
RefSeq | 767 | 1.5%
IGVD | 584 | 1.2%
PhagesDB | 551 | 1.1%
Genbank | 366 | 0.7%
Other values (3) | 309 | 0.6%
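Frequency tables in this report list the most common values and fold the rest into an "Other values (k)" row. In that row, k is the number of remaining distinct values and the count is their combined total. For Phage_source, the overview shows five sources plus "Other values (8)". Its count is 1400 + 767 + 584 + 551 + 366 + 309 = 3977, and 5 + 8 gives the 13 distinct values. The sketch below reproduces this with `collections.Counter`. The function name and row layout are illustrative.

```python
from collections import Counter


def top_values(values: list[str], k: int) -> list[tuple[str, int, float]]:
    """Top-k values with counts and percentages, plus an 'Other values' row."""
    counts = Counter(values)
    total = len(values)
    rows = [(v, c, 100 * c / total) for v, c in counts.most_common(k)]
    rest = len(counts) - len(rows)  # distinct values not shown individually
    if rest > 0:
        other = total - sum(c for _, c, _ in rows)
        rows.append((f"Other values ({rest})", other, 100 * other / total))
    return rows


data = ["MGV"] * 5 + ["GPD"] * 4 + ["TemPhD"] * 2 + ["RefSeq", "Genbank"]
for value, count, pct in top_values(data, k=2):
    print(f"{value} | {count} | {pct:.1f}%")
```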

Length

Histogram of lengths of the category

Words
Value | Count | Frequency (%)
mgv | 14613 | 29.2%
gpd | 13143 | 26.3%
temphd | 7794 | 15.6%
gov2 | 6868 | 13.7%
chvd | 3605 | 7.2%
gvd | 1400 | 2.8%
refseq | 767 | 1.5%
igvd | 584 | 1.2%
phagesdb | 551 | 1.1%
genbank | 366 | 0.7%
Other values (3) | 309 | 0.6%

Most occurring characters

Value | Count | Frequency (%)
G | 36974 | 19.4%
V | 27327 | 14.3%
D | 27131 | 14.2%
P | 21488 | 11.2%
M | 14638 | 7.7%
e | 10245 | 5.4%
h | 8345 | 4.4%
T | 8051 | 4.2%
m | 7794 | 4.1%
O | 6868 | 3.6%
Other values (18) | 22150 | 11.6%

Most occurring categories

Value | Count | Frequency (%)
(unknown) | 191011 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
(unknown) | 191011 | 100.0%

Most occurring blocks

Value | Count | Frequency (%)
(unknown) | 191011 | 100.0%

Interactions
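The interaction plots and the "High correlation" alerts concern pairs of numeric columns such as Length and Insideend. This report does not state which coefficient or threshold ydata-profiling used for those alerts. As an illustration only, the sketch below computes a rank (Spearman) correlation with the standard library; choosing Spearman here is an assumption. The input values are toy numbers, not taken from this dataset.

```python
def average_ranks(values: list[float]) -> list[float]:
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend `end` across the run of values tied with order[start]
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1
        for pos in range(start, end + 1):
            ranks[order[pos]] = mean_rank
        start = end + 1
    return ranks


def spearman(x: list[float], y: list[float]) -> float:
    """Spearman's rho: Pearson correlation computed on average ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    return cov / (vx * vy) ** 0.5


# Toy values: a monotonic but strongly non-linear pair
x = [1, 2, 3, 4, 5]
y = [10, 20, 30, 40, 1000]
print(spearman(x, y))  # 1.0
```

A rank-based coefficient is insensitive to the long right tails seen in columns such as Length and Insidestart. A single extreme value does not dominate it the way it can dominate Pearson's r.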

2025-07-21T10:42:25.155002image/svg+xmlMatplotlib v3.9.4, https://matplotlib.org/
2025-07-21T10:42:14.856584image/svg+xmlMatplotlib v3.9.4, https://matplotlib.org/
2025-07-21T10:42:16.118336image/svg+xmlMatplotlib v3.9.4, https://matplotlib.org/
2025-07-21T10:42:17.083310image/svg+xmlMatplotlib v3.9.4, https://matplotlib.org/
2025-07-21T10:42:18.054222image/svg+xmlMatplotlib v3.9.4, https://matplotlib.org/
2025-07-21T10:42:19.173228image/svg+xmlMatplotlib v3.9.4, https://matplotlib.org/
2025-07-21T10:42:20.069569image/svg+xmlMatplotlib v3.9.4, https://matplotlib.org/
2025-07-21T10:42:21.381066image/svg+xmlMatplotlib v3.9.4, https://matplotlib.org/
2025-07-21T10:42:22.422935image/svg+xmlMatplotlib v3.9.4, https://matplotlib.org/
2025-07-21T10:42:23.270199image/svg+xmlMatplotlib v3.9.4, https://matplotlib.org/
2025-07-21T10:42:24.296208image/svg+xmlMatplotlib v3.9.4, https://matplotlib.org/
2025-07-21T10:42:25.235377image/svg+xmlMatplotlib v3.9.4, https://matplotlib.org/
2025-07-21T10:42:14.987309image/svg+xmlMatplotlib v3.9.4, https://matplotlib.org/
2025-07-21T10:42:16.204465image/svg+xmlMatplotlib v3.9.4, https://matplotlib.org/
2025-07-21T10:42:17.173877image/svg+xmlMatplotlib v3.9.4, https://matplotlib.org/
2025-07-21T10:42:18.145871image/svg+xmlMatplotlib v3.9.4, https://matplotlib.org/
2025-07-21T10:42:19.255830image/svg+xmlMatplotlib v3.9.4, https://matplotlib.org/
2025-07-21T10:42:20.156082image/svg+xmlMatplotlib v3.9.4, https://matplotlib.org/
2025-07-21T10:42:21.493638image/svg+xmlMatplotlib v3.9.4, https://matplotlib.org/
2025-07-21T10:42:22.505684image/svg+xmlMatplotlib v3.9.4, https://matplotlib.org/
2025-07-21T10:42:23.353951image/svg+xmlMatplotlib v3.9.4, https://matplotlib.org/
2025-07-21T10:42:24.371952image/svg+xmlMatplotlib v3.9.4, https://matplotlib.org/
2025-07-21T10:42:25.312814image/svg+xmlMatplotlib v3.9.4, https://matplotlib.org/
2025-07-21T10:42:15.085027image/svg+xmlMatplotlib v3.9.4, https://matplotlib.org/
2025-07-21T10:42:16.299703image/svg+xmlMatplotlib v3.9.4, https://matplotlib.org/
2025-07-21T10:42:17.259557image/svg+xmlMatplotlib v3.9.4, https://matplotlib.org/
2025-07-21T10:42:18.228515image/svg+xmlMatplotlib v3.9.4, https://matplotlib.org/
2025-07-21T10:42:19.337005image/svg+xmlMatplotlib v3.9.4, https://matplotlib.org/
2025-07-21T10:42:20.243834image/svg+xmlMatplotlib v3.9.4, https://matplotlib.org/
2025-07-21T10:42:21.606102image/svg+xmlMatplotlib v3.9.4, https://matplotlib.org/
2025-07-21T10:42:22.580216image/svg+xmlMatplotlib v3.9.4, https://matplotlib.org/
2025-07-21T10:42:23.427701image/svg+xmlMatplotlib v3.9.4, https://matplotlib.org/
2025-07-21T10:42:24.451559image/svg+xmlMatplotlib v3.9.4, https://matplotlib.org/
2025-07-21T10:42:25.428525image/svg+xmlMatplotlib v3.9.4, https://matplotlib.org/
2025-07-21T10:42:15.190051image/svg+xmlMatplotlib v3.9.4, https://matplotlib.org/
2025-07-21T10:42:16.381958image/svg+xmlMatplotlib v3.9.4, https://matplotlib.org/
2025-07-21T10:42:17.344038image/svg+xmlMatplotlib v3.9.4, https://matplotlib.org/
2025-07-21T10:42:18.511515image/svg+xmlMatplotlib v3.9.4, https://matplotlib.org/
2025-07-21T10:42:19.415082image/svg+xmlMatplotlib v3.9.4, https://matplotlib.org/
2025-07-21T10:42:20.326759image/svg+xmlMatplotlib v3.9.4, https://matplotlib.org/
2025-07-21T10:42:21.698602image/svg+xmlMatplotlib v3.9.4, https://matplotlib.org/
2025-07-21T10:42:22.660409image/svg+xmlMatplotlib v3.9.4, https://matplotlib.org/
2025-07-21T10:42:23.504945image/svg+xmlMatplotlib v3.9.4, https://matplotlib.org/
2025-07-21T10:42:24.526523image/svg+xmlMatplotlib v3.9.4, https://matplotlib.org/
2025-07-21T10:42:25.515113image/svg+xmlMatplotlib v3.9.4, https://matplotlib.org/
2025-07-21T10:42:15.297096image/svg+xmlMatplotlib v3.9.4, https://matplotlib.org/
2025-07-21T10:42:16.475490image/svg+xmlMatplotlib v3.9.4, https://matplotlib.org/
2025-07-21T10:42:17.439058image/svg+xmlMatplotlib v3.9.4, https://matplotlib.org/
2025-07-21T10:42:18.600305image/svg+xmlMatplotlib v3.9.4, https://matplotlib.org/
2025-07-21T10:42:19.495134image/svg+xmlMatplotlib v3.9.4, https://matplotlib.org/
2025-07-21T10:42:20.417099image/svg+xmlMatplotlib v3.9.4, https://matplotlib.org/
2025-07-21T10:42:21.834171image/svg+xmlMatplotlib v3.9.4, https://matplotlib.org/
2025-07-21T10:42:22.738715image/svg+xmlMatplotlib v3.9.4, https://matplotlib.org/
2025-07-21T10:42:23.582951image/svg+xmlMatplotlib v3.9.4, https://matplotlib.org/
2025-07-21T10:42:24.601589image/svg+xmlMatplotlib v3.9.4, https://matplotlib.org/
2025-07-21T10:42:25.600856image/svg+xmlMatplotlib v3.9.4, https://matplotlib.org/
2025-07-21T10:42:15.388846image/svg+xmlMatplotlib v3.9.4, https://matplotlib.org/
2025-07-21T10:42:16.550949image/svg+xmlMatplotlib v3.9.4, https://matplotlib.org/
2025-07-21T10:42:17.525515image/svg+xmlMatplotlib v3.9.4, https://matplotlib.org/
2025-07-21T10:42:18.675189image/svg+xmlMatplotlib v3.9.4, https://matplotlib.org/
2025-07-21T10:42:19.587182image/svg+xmlMatplotlib v3.9.4, https://matplotlib.org/
2025-07-21T10:42:20.495278image/svg+xmlMatplotlib v3.9.4, https://matplotlib.org/
2025-07-21T10:42:21.937155image/svg+xmlMatplotlib v3.9.4, https://matplotlib.org/
2025-07-21T10:42:22.809561image/svg+xmlMatplotlib v3.9.4, https://matplotlib.org/
2025-07-21T10:42:23.666490image/svg+xmlMatplotlib v3.9.4, https://matplotlib.org/
2025-07-21T10:42:24.667298image/svg+xmlMatplotlib v3.9.4, https://matplotlib.org/

Correlations

Column numbers (1)-(12) refer to the variables in the same order as the rows.

                              (1)    (2)    (3)    (4)    (5)    (6)    (7)    (8)    (9)   (10)   (11)   (12)
(1)  Expnumberfirst60AAs    1.000  0.406 -0.168  0.129 -0.354 -0.330 -0.121  0.049  0.355 -0.145 -0.164  0.155
(2)  ExpnumberofAAsinTMHs   0.406  1.000  0.513  0.731  0.269  0.323  0.584  0.050  0.873  0.693  0.673  0.083
(3)  Insideend             -0.168  0.513  1.000  0.782  0.525  0.122  0.289  0.024  0.530  0.666  0.659 -0.259
(4)  Insidestart            0.129  0.731  0.782  1.000  0.291  0.119  0.251  0.023  0.781  0.661  0.655 -0.189
(5)  Length                -0.354  0.269  0.525  0.291  1.000  0.737  0.441  0.022  0.263  0.486  0.491 -0.013
(6)  Outsideend            -0.330  0.323  0.122  0.119  0.737  1.000  0.730  0.018  0.308  0.611  0.623  0.237
(7)  Outsidestart          -0.121  0.584  0.289  0.251  0.441  0.730  1.000  0.028  0.619  0.762  0.764  0.280
(8)  Phage_source           0.049  0.050  0.024  0.023  0.022  0.018  0.028  1.000  0.047  0.023  0.023  0.027
(9)  PredictedTMHsNumber    0.355  0.873  0.530  0.781  0.263  0.308  0.619  0.047  1.000  0.676  0.678  0.097
(10) TMhelixend            -0.145  0.693  0.666  0.661  0.486  0.611  0.762  0.023  0.676  1.000  0.994  0.090
(11) TMhelixstart          -0.164  0.673  0.659  0.655  0.491  0.623  0.764  0.023  0.678  0.994  1.000  0.099
(12) TotalprobofNin         0.155  0.083 -0.259 -0.189 -0.013  0.237  0.280  0.027  0.097  0.090  0.099  1.000
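The correlation matrix is symmetric, so every pair appears twice. A minimal pandas sketch for listing each unordered pair above a cutoff once, strongest first, is shown below. It uses a five-variable excerpt of the coefficients above; the helper `strong_pairs` and the 0.8 cutoff are illustrative choices, not the profiler's own alert logic.

```python
from itertools import combinations

import pandas as pd

# Five-variable excerpt of the correlation table (same coefficients, same order).
names = ["ExpnumberofAAsinTMHs", "Phage_source", "PredictedTMHsNumber",
         "TMhelixend", "TMhelixstart"]
corr = pd.DataFrame(
    [
        [1.000, 0.050, 0.873, 0.693, 0.673],
        [0.050, 1.000, 0.047, 0.023, 0.023],
        [0.873, 0.047, 1.000, 0.676, 0.678],
        [0.693, 0.023, 0.676, 1.000, 0.994],
        [0.673, 0.023, 0.678, 0.994, 1.000],
    ],
    index=names,
    columns=names,
)


def strong_pairs(corr: pd.DataFrame, cutoff: float) -> list[tuple[str, str, float]]:
    """Return each unordered pair with |coefficient| >= cutoff, strongest first."""
    pairs = [
        (a, b, float(corr.loc[a, b]))
        # combinations() visits only the upper triangle: no diagonal, no mirrored duplicates
        for a, b in combinations(corr.columns, 2)
        if abs(corr.loc[a, b]) >= cutoff
    ]
    return sorted(pairs, key=lambda p: abs(p[2]), reverse=True)


for a, b, r in strong_pairs(corr, cutoff=0.8):
    print(f"{a} ~ {b}: {r:.3f}")
# TMhelixend ~ TMhelixstart: 0.994
# ExpnumberofAAsinTMHs ~ PredictedTMHsNumber: 0.873
```

Sorting on the absolute value keeps strong negative coefficients next to strong positive ones; none of the pairs in this excerpt is negative.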

Missing values

A simple visualization of nullity by column.
The nullity matrix is a data-dense display that lets you quickly pick out visual patterns in data completion.
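In this dataset the overview reports missing cells in only one column, POSSIBLENterm (9625 values, 19.2%). A minimal pandas sketch of the two views described above, a per-column nullity count and a text nullity matrix, is given below on four rows copied from the sample. The helpers `nullity_by_column` and `nullity_matrix` are hypothetical and are not ydata-profiling's plotting code.

```python
import pandas as pd

# Four rows copied from the sample; None stands in for the NaN in POSSIBLENterm.
df = pd.DataFrame({
    "Protein_ID": ["TemPhD_cluster_21940_29", "uvig_280215_16",
                   "TemPhD_cluster_2820_6", "uvig_396803_67"],
    "Expnumberfirst60AAs": [22.60786, 0.0, 19.76956, 18.45494],
    "POSSIBLENterm": [True, None, True, True],
})


def nullity_by_column(df: pd.DataFrame) -> pd.DataFrame:
    """Count and fraction of missing cells per column."""
    isna = df.isna()
    return pd.DataFrame({"missing": isna.sum(), "missing_frac": isna.mean()})


def nullity_matrix(df: pd.DataFrame) -> list[str]:
    """Text nullity matrix: one string per row, '#' = filled cell, '.' = missing cell."""
    return ["".join("." if m else "#" for m in row) for row in df.isna().to_numpy()]


print(nullity_by_column(df))
print("\n".join(nullity_matrix(df)))
```

Each character position in the matrix corresponds to a column, so a column that is often missing shows up as a vertical run of dots; here only the second row has a gap, in the last column.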

Sample

First rows

(index) | Phage_ID | Protein_ID | Length | PredictedTMHsNumber | ExpnumberofAAsinTMHs | Expnumberfirst60AAs | TotalprobofNin | POSSIBLENterm | Insidesource | Insidestart | Insideend | TMhelixsource | TMhelixstart | TMhelixend | Outsidesource | Outsidestart | Outsideend | Phage_source
1392473 | MGV-GENOME-0377366 | MGV-GENOME-0377366_94 | 107 | 2 | 39.89314 | 39.89314 | 0.99747 | True | TMHMM2.0 | 53.0 | 107.0 | TMHMM2.0 | 33.0 | 52.0 | TMHMM2.0 | 30.0 | 32.0 | MGV
1225655 | MGV-GENOME-0228589 | MGV-GENOME-0228589_3 | 204 | 2 | 45.03472 | 42.87273 | 0.99861 | True | TMHMM2.0 | 62.0 | 204.0 | TMHMM2.0 | 39.0 | 61.0 | TMHMM2.0 | 30.0 | 38.0 | MGV
2065853 | TemPhD_cluster_54944 | TemPhD_cluster_54944_50 | 108 | 1 | 22.15821 | 21.19442 | 0.14212 | True | TMHMM2.0 | 43.0 | 108.0 | TMHMM2.0 | 24.0 | 42.0 | TMHMM2.0 | 1.0 | 23.0 | TemPhD
1828787 | TemPhD_cluster_21940 | TemPhD_cluster_21940_29 | 41 | 1 | 22.60786 | 22.60786 | 0.35682 | True | TMHMM2.0 | 38.0 | 41.0 | TMHMM2.0 | 15.0 | 37.0 | TMHMM2.0 | 1.0 | 14.0 | TemPhD
575773 | uvig_280215 | uvig_280215_16 | 571 | 1 | 22.92261 | 0.00000 | 0.91546 | NaN | TMHMM2.0 | 1.0 | 169.0 | TMHMM2.0 | 170.0 | 192.0 | TMHMM2.0 | 193.0 | 571.0 | GPD
1869556 | TemPhD_cluster_2820 | TemPhD_cluster_2820_6 | 66 | 1 | 19.76984 | 19.76956 | 0.91180 | True | TMHMM2.0 | 1.0 | 6.0 | TMHMM2.0 | 7.0 | 26.0 | TMHMM2.0 | 27.0 | 66.0 | TemPhD
717310 | uvig_396803 | uvig_396803_67 | 183 | 1 | 18.65302 | 18.45494 | 0.74857 | True | TMHMM2.0 | 1.0 | 6.0 | TMHMM2.0 | 7.0 | 25.0 | TMHMM2.0 | 26.0 | 183.0 | GPD
1193825 | MGV-GENOME-0085121 | MGV-GENOME-0085121_16 | 98 | 2 | 42.31297 | 40.90727 | 0.98844 | True | TMHMM2.0 | 62.0 | 98.0 | TMHMM2.0 | 44.0 | 61.0 | TMHMM2.0 | 35.0 | 43.0 | MGV
1524904 | MGV-GENOME-0378116 | MGV-GENOME-0378116_30 | 122 | 2 | 41.78412 | 23.94402 | 0.91384 | True | TMHMM2.0 | 81.0 | 122.0 | TMHMM2.0 | 58.0 | 80.0 | TMHMM2.0 | 44.0 | 57.0 | MGV
1051224 | MGV-GENOME-0357329 | MGV-GENOME-0357329_12 | 169 | 3 | 64.41947 | 27.06577 | 0.30090 | True | TMHMM2.0 | 119.0 | 169.0 | TMHMM2.0 | 96.0 | 118.0 | TMHMM2.0 | 93.0 | 95.0 | MGV
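Each sample row lists a single start/end pair per segment type (Inside, TMhelix, Outside), even when PredictedTMHsNumber is 2 or 3. Assuming these are 1-based, inclusive residue positions, which is how TMHMM reports segments, a segment spans `end - start + 1` residues and should lie within `1..Length`. The sketch below applies both checks to the TMhelix columns of the first ten sample rows above; the helper names are hypothetical.

```python
import pandas as pd

# Length and TMhelix coordinates copied from the first ten sample rows.
first_rows = pd.DataFrame({
    "Length":       [107, 204, 108, 41, 571, 66, 183, 98, 122, 169],
    "TMhelixstart": [33, 39, 24, 15, 170, 7, 7, 44, 58, 96],
    "TMhelixend":   [52, 61, 42, 37, 192, 26, 25, 61, 80, 118],
})


def helix_lengths(df: pd.DataFrame) -> pd.Series:
    """Residues per TMhelix segment, assuming 1-based inclusive coordinates."""
    return df["TMhelixend"] - df["TMhelixstart"] + 1


def helix_in_bounds(df: pd.DataFrame) -> pd.Series:
    """True where 1 <= TMhelixstart <= TMhelixend <= Length."""
    return (
        (df["TMhelixstart"] >= 1)
        & (df["TMhelixstart"] <= df["TMhelixend"])
        & (df["TMhelixend"] <= df["Length"])
    )


print(helix_lengths(first_rows).tolist())       # [20, 23, 19, 23, 23, 20, 19, 18, 23, 23]
print(bool(helix_in_bounds(first_rows).all()))  # True
```

On these rows every helix spans 18 to 23 residues. Running the same checks on the full table would flag any row whose coordinates fall outside the protein.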
Last rows

(index) | Phage_ID | Protein_ID | Length | PredictedTMHsNumber | ExpnumberofAAsinTMHs | Expnumberfirst60AAs | TotalprobofNin | POSSIBLENterm | Insidesource | Insidestart | Insideend | TMhelixsource | TMhelixstart | TMhelixend | Outsidesource | Outsidestart | Outsideend | Phage_source
890899 | uvig_582989 | uvig_582989_7 | 299 | 2 | 44.44968 | 30.89670 | 0.99829 | True | TMHMM2.0 | 75.0 | 299.0 | TMHMM2.0 | 55.0 | 74.0 | TMHMM2.0 | 36.0 | 54.0 | GPD
1770451 | TemPhD_cluster_13199 | TemPhD_cluster_13199_51 | 117 | 3 | 65.10179 | 29.44552 | 0.99441 | True | TMHMM2.0 | 75.0 | 93.0 | TMHMM2.0 | 94.0 | 116.0 | TMHMM2.0 | 117.0 | 117.0 | TemPhD
128492 | Ma_2019_SRR413601_NODE_1094_length_12962_cov_2.987216 | Ma_2019_SRR413601_NODE_1094_length_12962_cov_2.987216_6 | 152 | 1 | 17.47566 | 17.46996 | 0.16250 | True | TMHMM2.0 | 29.0 | 152.0 | TMHMM2.0 | 10.0 | 28.0 | TMHMM2.0 | 1.0 | 9.0 | GVD
434484 | uvig_176237 | uvig_176237_10 | 224 | 2 | 34.75602 | 22.97046 | 0.50513 | True | TMHMM2.0 | 33.0 | 52.0 | TMHMM2.0 | 53.0 | 75.0 | TMHMM2.0 | 76.0 | 224.0 | GPD
809749 | uvig_492895 | uvig_492895_10 | 91 | 2 | 41.70662 | 41.19610 | 0.99907 | True | TMHMM2.0 | 59.0 | 91.0 | TMHMM2.0 | 41.0 | 58.0 | TMHMM2.0 | 27.0 | 40.0 | GPD
2251830 | SAMN05414905_a1_ct51712_vs1 | SAMN05414905_a1_ct51712_vs1_1 | 111 | 1 | 29.94583 | 9.53467 | 0.81494 | NaN | TMHMM2.0 | 1.0 | 56.0 | TMHMM2.0 | 57.0 | 79.0 | TMHMM2.0 | 80.0 | 111.0 | CHVD
24239 | NC_042134.1 | YP_009625909.1 | 67 | 2 | 42.62807 | 42.62770 | 0.98902 | True | TMHMM2.0 | 56.0 | 67.0 | TMHMM2.0 | 33.0 | 55.0 | TMHMM2.0 | 30.0 | 32.0 | RefSeq
2734654 | Station102_MES_ALL_assembly_NODE_272_length_47729_cov_28.559886 | Station102_MES_ALL_assembly_NODE_272_length_47729_cov_28.559886_53 | 121 | 1 | 19.79574 | 0.00006 | 0.92093 | NaN | TMHMM2.0 | 1.0 | 87.0 | TMHMM2.0 | 88.0 | 110.0 | TMHMM2.0 | 111.0 | 121.0 | GOV2
1551990 | MGV-GENOME-0364654 | MGV-GENOME-0364654_31 | 111 | 1 | 17.60196 | 17.59891 | 0.02257 | True | TMHMM2.0 | 33.0 | 111.0 | TMHMM2.0 | 15.0 | 32.0 | TMHMM2.0 | 1.0 | 14.0 | MGV
2152546 | TemPhD_cluster_6461 | TemPhD_cluster_6461_44 | 225 | 1 | 22.98798 | 0.00000 | 0.51573 | NaN | TMHMM2.0 | 184.0 | 225.0 | TMHMM2.0 | 161.0 | 183.0 | TMHMM2.0 | 1.0 | 160.0 | TemPhD
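Across all 20 sample rows, POSSIBLENterm is True exactly where Expnumberfirst60AAs exceeds 10 and is missing otherwise. This would explain why the column is constant True with 19.2% missing values. It also appears consistent with TMHMM 2.0 printing its "POSSIBLE N-term signal sequence" warning only above that count, but the threshold of 10 is an assumption to verify on all 50000 rows. The sketch below checks the rule on the last ten sample rows; `flag_follows_rule` is a hypothetical helper.

```python
import pandas as pd

# Expnumberfirst60AAs and POSSIBLENterm for the last ten sample rows
# (None stands in for the NaN entries).
last_rows = pd.DataFrame({
    "Expnumberfirst60AAs": [30.89670, 29.44552, 17.46996, 22.97046, 41.19610,
                            9.53467, 42.62770, 0.00006, 17.59891, 0.00000],
    "POSSIBLENterm": [True, True, True, True, True, None, True, None, True, None],
})


def flag_follows_rule(df: pd.DataFrame, threshold: float = 10.0) -> pd.Series:
    """True where 'flag present' agrees with 'Expnumberfirst60AAs > threshold'.

    The default threshold of 10 is an assumption about how the flag was produced.
    """
    predicted = df["Expnumberfirst60AAs"] > threshold
    observed = df["POSSIBLENterm"].notna()
    return predicted == observed


print(bool(flag_follows_rule(last_rows).all()))  # True
```

Rows where the result is False would point to either a different threshold or a flag that depends on more than this one column.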